Previously, I have shown how to use -margins- after -ml-, for the linear regression model (under normality assumption), and for the probit model.
Today, I'll provide yet another example, where the goal is to estimate a two equation model. Namely, the infamous "IVprobit", or instrumental variable probit.
I call this infamous, because it has been subject of multiple threads on Statalist, because of the consistency (or lack there off) regarding estimated marginal effects. you can read some of the latest discussion here.
The bottom line is that through different Stata versions, margins produced different versions of "marginal effects" for this model. All of them correct, but correct from a computational point of view.
Based on my own assessment, corroborated by discussions with colleagues this is the summary of what Stata would do:
So, without further ado, let me present you my own version of how to estimate an ivprobit model, and how the predict program could be set up, and why we had so many types of marginal effects lurking around.
First thing first. Unless you already have this program saved somewhere in your accesible ado files (most likely the "ado/personal" folder), make sure to have the following program in memory.
. program adde, eclass 1. ereturn `0' 2. end
So, lets start.
The IVprobit model is a nonlinear model that can be used when your dependent variable is a binary variable (0 - 1), thus you want to estimate the "probability of success (y=1)" given a set of characteristics. \( P(y=1|X) \)., but you have some continuous and endogenous explanatory variable.
Econometrics 101. If a variable is endogenous, we cannot estimate the model and interpret the results as causal effects, because changes in the endogenous variable may happen at the same time as changes in unobserved components. Thus, if the outcome changes, we do not know if it is because the endogenous variable changed, or because the unobservables changed. (they are, after all, correlated).
In this case, one option we have to deal with the unobserved confounders is to use instruments to either isolate exogenous variation of the variable of interest (using the 2sls approach for example) or obtain an approximation of the endogenous component that we can control for (Control function approach).
IV-probit is in fact the application of the latter. A control function approach.
Formally, IVprobit model can be written as follows: $$y_2 = z_1 \delta_1 + z_2 \delta_2 + u_2 \quad (1) $$ $$y^*_1 = z_1 \beta_1 + y_2 \beta_2 + u_1 \quad (2) $$ $$y_1= 1(y^*_1>0)$$
where the errors \( (u_1,u_2) \) follow a bivariate normal distribution: $$ \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \sim Normal \begin{pmatrix} \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \sigma_2\\ \rho \sigma_2 & \sigma^2_2 \end{pmatrix} \end{pmatrix} $$ In this model \( z_1 \) is the set of exogenous variables that affect \( y^*_1 \), and \( z_2 \) are the instruments.
It is also easy to see that \( y_2 \) is endogenous, because: $$ if \quad corr(u_1,u_2) \neq 0 \quad and \quad corr(y_2,u_2) \neq 0 \rightarrow corr(y_2,u_1) \neq 0 $$
So how do we estimate this model? by parts!
First, equation (1) can be estimated directly, because it is a function of exogenous variables only. Thus, we could estimate that equation using standard OLS (as ivprobit-two-step does), or via MLE assuming the normality of the errors \( u_2 \).
The one that requires more attention is equation (2). We know that \( corr(y_2,u_1) \) is different from zero, which is the cause of the endogeneity of \( y_2 \). We could, however, "decompose" \( u_1 \), into two parts. One that contains the endogenous component, and one that is exogenous and uncorrelated with all other variables. First, recall that if \( (u_1,u_2) \) follows a bivariate normal distribution, then, conditional on \( u_2 \), \( u_1 \) will have the following distribution: $$ u_1 \sim N \left( \rho \frac{u_2}{\sigma_2}, \sqrt{1-\rho^2} \right) $$ This means that: $$ u_1 = \rho \frac{u_2}{\sigma_2} + v_1 \quad (3) $$ so that \( v_1 \) is uncorrelated with \( y_2 \). We can replace equation (3) into (2) which gives us: $$y^*_1 = z_1 \beta_1 + y_2 \beta_2 + \rho \frac{u_2}{\sigma_2} + v_1 (4) $$ This equation could now be estimated directly, assuming we observe \( u_2 \). However, to be estimated using a probit model, we need to rescale the equation so that the error \(v_1 \) has a variance of 1.
This can be done by simply dividing all terms by \( \sqrt(1-\rho^2) \), and estimating the following model using standard probit model. $$\frac{y^*_1}{\sqrt{1-\rho^2})} = z_1 \frac{\beta_1}{\sqrt{1-\rho^2}} + y_2 \frac{\beta_2}{\sqrt{1-\rho^2}} + \frac{\rho}{\sqrt{1-\rho^2}} \frac{u_2}{\sigma_2} + \frac{v_1}{\sqrt{1-\rho^2}} \quad (5) $$ or perhaps the simpler: $$y^{**}_1 = z_1 \beta^r_1 + y_2 \beta^r_2 + \theta \frac{u_2}{\sigma_2} + \nu_1 \quad (6) $$
So what is the difference between using one equation or the other?. I would argue none, as long as you know how to estimate the standard errors from the system.
So, how do we estimate this ivprobit model? I would say that there are at least three ways of doing so.
The first method is what is typically known as the two-step approach. In the first
step, one estimates equation (1) using, say, OLS. obtain the predicted residuals \( \hat{u}_2 \),
plug them in equation (6), which can be estimated using a simple probit model. Of course,
in doing so, one does not obtain estimates for the structural coefficients, but only the
"rescaled" ones.
The main difficulty of using this method is the correct estimation of the standard errors
of all coefficients.
Many textbooks would say it is a "simple" application of a delta method, or, use of bootstrap,
but the fact of the matter is that you need some way to account that \( \hat{u}_2 \) is
a variable that is measured with error when you estimate equation (5).
The second method is to estimate equations (1) and (4) simultaneously using full information maximum likelihood.
This imposes the assumption that the errors follow a bivariate normal distribution, and allows you to
obtain estimates for the structural parameters in equation (2), in addition to the "linked" parameters
\( \sigma_2 \) and \( \rho \).
Under this strategy, the contribution of a single observation to the likelihood function becomes:
$$ L_i = L^1_i * L^2_i \quad (7a)$$
$$ L^1_i=\phi \left( y_2,z_1 \delta_1 + z_2 \delta_2, \sigma_2 \right) \quad (7b)$$
$$ L^2_i=\Phi \left( \frac{ z_1 \beta_1 + y_2 \beta_2 + \rho \frac{y_2 -z_1 \delta_1 - z_2 \delta_2}{\sigma_2}}
{\sqrt{1-\rho^2}} \right) \quad if \quad y_1 ==1 (7c)$$
$$ L^2_i=1-\Phi \left( \frac{ z_1 \beta_1 + y_2 \beta_2 + \rho \frac{y_2 -z_1 \delta_1 - z_2 \delta_2}{\sigma_2}}
{\sqrt{1-\rho^2}} \right) \quad if \quad y_1 ==0 (7d)$$
Notice that instead of "plugging in" \( \hat{u}_2 \) in the second equation, we plug \( y_2 -z_1 \delta_1 - z_2 \delta_2 \).
This allows to implicitly account for the measurement errors of the first stage.
There is a third option, which I call "two-step-mle". I call it two-stage because the ivprobit will be estimated using equations (1) and (5). However, I call it MLE, because both equations are estimated simultaneously using MLE:
$$ L_i = L^1_i * L^2_i \quad (8a)$$
$$ L^1_i=\phi \left( y_2,z_1 \delta_1 + z_2 \delta_2, \sigma_2 \right) \quad (8b)$$
$$ L^2_i=\Phi \left( z_1 \beta^r_1 + y_2 \beta^r_2 + \theta \frac {y_2 -z_1 \delta_1 - z_2 \delta_2}{\sigma_2} \right) \quad (8c)$$
The main difference with the standard FIML, is that only rescaled coefficients are are estimated, and that the
link between both equation is not \( \rho \), but \( \theta \).
However, for this simplified example, both equations identify exactly the same model.
Compared to the usual two-step approach, however, because the model is estimated simultaneously, the
standard errors of all coefficients can be correctly estimated, without further calculations (no delta method nor bootstrap).
One last thing to notice in this model. First, there is a close relationship between \( \rho \) and \( \theta \):
$$\theta = \frac{\rho}{\sqrt{1-\rho ^2 }} \quad (9a)$$
$$\rho ^2 = \frac{\theta ^2 }{{1+\theta^2 }} \quad (9b)$$
and between the rescaled parameters (we will need this later on):
$$\beta_1 = \beta^r_1 * {\sqrt{1-\rho^2}} = \frac{\beta^r_1}{\sqrt{1+\theta^2}} \quad (10)$$
While I have shown that using FIML and two-step-ml will provide virtually the same results, I'll stick with the two-step approach,
as it allows me to derive marginal effects telling the story of what happened to -margins- through different Stata versions.
The following program defines this the log-likelihood function for the IV probit, using the two-step approach (equation 8), using the following
walk-through for the specification:
. capture program drop myivprobit_2sls . program myivprobit_2sls 1. args lnf xb theta zb lnsigma 2. qui { 3. local y1 $ML_y1 4. local y2 $ML_y2 5. local u2 (`y2'-`zb') 6. . tempvar xb_zb p1 p0 7. gen double `xb_zb'= `xb'+`theta'*((`u2')/exp(`lnsigma')) 8. gen double `p1' = normal( `xb_zb') 9. gen double `p0' = normal(-`xb_zb') 10. tempvar lnf1 lnf2 11. gen double `lnf1' = log(normalden(`y2', `zb', exp(`lnsigma'))) 12. gen double `lnf2' = log(`p1') if `y1'==1 13. replace `lnf2' = log(`p0') if `y1'==0 14. replace `lnf' = `lnf1' + `lnf2' 15. } 16. . end
So finally, the part that will be a bit more controversial. The prediction of the probability of success!.
The reason why this is controversial is because there are two candidates have been suggested to identify this expression.
The first candidate relates to the structural equation (2). Basically, if we can estimate the unscaled coefficients, the predicted outcome could be identified by:
$$ P(y_1=1| z_1 , y_2) = \Phi \left( z_1 \beta_1 + y_2 \beta_2 \right) $$
or if one prefers the version based on rescaled coefficients:
$$ P(y_1=1| z_1 , y_2) = \Phi \left( z_1 \beta^r_1 * {\sqrt{1-\rho^2}} + y_2 \beta^r_2 * {\sqrt{1-\rho^2}} \right) $$
And marginal effects can be obtained by analyzing this equation alone. Standard errors for this expression (at the corresponding
marginal effects) can be identified directly only if we estimate the the IVprobit
by FIML, or using the rescaled coefficients, making sure standard errors are calculated acounting for the estimation errors of the first stage.
The second option relates to estimate the marginal effects using equation (6):
$$ P(y_1=1| z_1 , y_2, \hat{u}_2 ) = \Phi \left( z_1 \beta^r_1 + y_2 \beta^r_2 + \theta \hat{u}_2 \right) $$
However, because \( \hat{u}_2 \) is never really observed, it is usually recommended to average (but not ignore) the impact of \( \hat{u}_2 \)
on the equation:
$$ P(y_1=1| z_1 , y_2) = E \left( \Phi \left( z_1 \beta^r_1 + y_2 \beta^r_2 + \theta \hat{u}_2 \right)| z_1, y_2 \right) $$
The bottom line: If one uses the two-step approach, marginal effects could be estimated assuming \( \hat{u}_2 \) is just another
exogenous variable in the model. The difficulty would be obtaining the correct estimation of standard errors. (specially because they will depend on
the measurement errors of \( \hat{u}_2 \).
Now, the FIML option is the most efficient, because standard errors of \( P(y_1=1|.) \) depend on \( \beta_r \)'s and \( \rho \).
However the two-step procedure depend on \( \beta_r's \), \( \rho \), and \( \hat{u}_2 \).
So Lets write these two options into a "predict" program.
. program myivprobit_p 1. syntax newvarname [if] [in] , [ pr1 pr2 *] 2. if "`pr1'`pr2'" =="" { 3. ml_p `0' 4. } 5. tokenize `e(depvar)' 6. local y1 `1' 7. local y2 `2' 8. marksample touse, novarlist 9. if "`pr1'" !="" { 10. tempvar xb zb theta lnsigma 11. _predict double `xb' , eq(#1) 12. _predict double `theta', eq(#2) 13. _predict double `zb' , eq(#3) 14. _predict double `lnsigma', eq(#4) 15. gen `typlist' `varlist' = normal(`xb'+`theta'*(`y2'-`zb')/exp(`lnsigma')) if `touse' 16. label var `varlist' "P(y=1|X) two-step" 17. } 18. else if "`pr2'"!="" { 19. tempvar xb zb theta lnsigma 20. _predict double `xb' , eq(#1) 21. _predict double `theta' , eq(#2) 22. gen `typlist' `varlist' = normal(`xb'/sqrt(1+`theta'^2)) if `touse' 23. label var `varlist' "P(y=1|X) FIML" 24. } 25. end
so, the first option -pr1- will estimate the predicted probability as if the model were estimated using the two step approach,
whereas the second will estimate the predicted probability based on the structural equation.
Alright, so lets estimate the model and compare the results with the built-in ivprobit command:
. clear . webuse laborsup . global y1 fem_work . global z1 fem_educ kids . global y2 other_inc . global z2 male_educ . . *Built in command: . ivprobit $y1 $z1 ($y2 = $z2), two Checking reduced-form model... Two-step probit with endogenous regressors Number of obs = 500 Wald chi2(3) = 93.97 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- other_inc | -.058473 .0093364 -6.26 0.000 -.0767719 -.040174 fem_educ | .227437 .0281628 8.08 0.000 .1722389 .282635 kids | -.1961748 .0496323 -3.95 0.000 -.2934522 -.0988973 _cons | .3956061 .4982649 0.79 0.427 -.5809752 1.372187 ------------------------------------------------------------------------------ Instrumented: other_inc Instruments: fem_educ kids male_educ ------------------------------------------------------------------------------ Wald test of exogeneity: chi2(1) = 6.50 Prob > chi2 = 0.0108 . *my ivprobit two-step . ml model lf myivprobit_2sls ($y1 = $z1 $y2 ) (theta:) ($y2 = $z1 $z2 ) (lnsigma:) , /// > technique(nr bhhh) init(lnsigma:_cons = 2.81 ) maximize nolog . ml display Number of obs = 500 Wald chi2(3) = 94.01 Log likelihood = -2368.2062 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- eq1 | fem_educ | .227437 .0281561 8.08 0.000 .172252 .282622 kids | -.1961748 .0496179 -3.95 0.000 -.2934242 -.0989254 other_inc | -.058473 .0093339 -6.26 0.000 -.0767671 -.0401788 _cons | .3956062 .4981099 0.79 0.427 -.5806714 1.371884 -------------+---------------------------------------------------------------- theta | _cons | .4008085 .1626174 2.46 0.014 .0820843 .7195328 -------------+---------------------------------------------------------------- eq3 | fem_educ | .3351866 .2825972 1.19 0.236 -.2186937 .889067 kids | .8329055 .5475666 1.52 0.128 -.2403052 1.906116 male_educ | 2.845253 .282746 10.06 0.000 2.291081 3.399425 _cons | 9.872562 5.029193 1.96 0.050 .015524 19.7296 -------------+---------------------------------------------------------------- lnsigma | _cons | 2.813383 .0316228 88.97 0.000 2.751404 2.875363 ------------------------------------------------------------------------------
you can see right away that except for differences attributed to rounding errors, and degrees of freedom
the results are virtually the same.
It is also reasuring to see that the results are also the same when we compared ivprobit-mle and
the rescaled coefficients:
. *FIML . ivprobit $y1 $z1 ($y2 = $z2), ml Fitting exogenous probit model Iteration 0: log likelihood = -344.63508 Iteration 1: log likelihood = -252.10819 Iteration 2: log likelihood = -252.04529 Iteration 3: log likelihood = -252.04529 Fitting full model Iteration 0: log likelihood = -2368.2142 Iteration 1: log likelihood = -2368.2062 Iteration 2: log likelihood = -2368.2062 Probit model with endogenous regressors Number of obs = 500 Wald chi2(3) = 163.88 Log likelihood = -2368.2062 Prob > chi2 = 0.0000 ---------------------------------------------------------------------------------------------- | Coef. Std. Err. z P>|z| [95% Conf. Interval] -----------------------------+---------------------------------------------------------------- other_inc | -.0542756 .0060854 -8.92 0.000 -.0662028 -.0423485 fem_educ | .211111 .0268648 7.86 0.000 .1584569 .2637651 kids | -.1820929 .0478267 -3.81 0.000 -.2758315 -.0883542 _cons | .3672086 .4480724 0.82 0.412 -.5109971 1.245414 -----------------------------+---------------------------------------------------------------- corr(e.other_inc,e.fem_work)| .3720375 .1300518 .0946562 .5958136 sd(e.other_inc)| 16.66621 .5270318 15.66461 17.73186 ---------------------------------------------------------------------------------------------- Instrumented: other_inc Instruments: fem_educ kids male_educ ---------------------------------------------------------------------------------------------- Wald test of exogeneity (corr = 0): chi2(1) = 6.70 Prob > chi2 = 0.0096 . est sto ivp . *my ivprobit two-step . ml model lf myivprobit_2sls ($y1 = $z1 $y2 ) (theta:) ($y2 = $z1 $z2 ) (lnsigma:) , /// > technique(nr bhhh) init(lnsigma:_cons = 2.81 ) maximize nolog . adde local predict myivprobit_p . est sto myivp . *with rescaled coefficients: . nlcom (other_inc: _b[other_inc]/sqrt(1+_b[theta:_cons]^2)) /// > (fem_educ: _b[fem_educ]/sqrt(1+_b[theta:_cons]^2)) /// > (kids: _b[kids]/sqrt(1+_b[theta:_cons]^2)) /// > (cons: _b[_cons]/sqrt(1+_b[theta:_cons]^2)) other_inc: _b[other_inc]/sqrt(1+_b[theta:_cons]^2) fem_educ: _b[fem_educ]/sqrt(1+_b[theta:_cons]^2) kids: _b[kids]/sqrt(1+_b[theta:_cons]^2) cons: _b[_cons]/sqrt(1+_b[theta:_cons]^2) ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- other_inc | -.0542756 .0060854 -8.92 0.000 -.0662028 -.0423485 fem_educ | .211111 .0268648 7.86 0.000 .1584569 .2637651 kids | -.1820929 .0478267 -3.81 0.000 -.2758316 -.0883543 cons | .3672086 .4480724 0.82 0.412 -.5109971 1.245414 ------------------------------------------------------------------------------Where the results are exactly the same.
Let me now walk you through the Story of marginal effects with ivprobit.
Back in Stata 13, marginal effects for IV probit were estimated using the structural equation coeffients:
$$ P(y_1=1| z_1 , y_2) = \Phi \left( z_1 \beta_1 + y_2 \beta_2 \right) $$
So that marginal effects were defined as:
$$\frac{\partial P(y_1=1|.)}{\partial z_1} = \phi ( z_1 \beta_1 + y_2 \beta_2 ) \beta_1 $$
$$\frac{\partial P(y_1=1|.)}{\partial y_2} = \phi ( z_1 \beta_1 + y_2 \beta_2 ) \beta_2 $$
While I currently do not have access to Stata13,
this is what it would produce under version control from Stata 14.2:
. //margins, dydx(*) predict(pr) . // . //Average marginal effects Number of obs = 500 . //Model VCE : OIM . // . //Expression : Prob of positive outcome when rho=0, predict(pr) . //dy/dx w.r.t. : other_inc fem_educ kids male_educ . // . //------------------------------------------------------------------------------ . // | Delta-method . // | dy/dx Std. Err. z P>|z| [95% Conf. Interval] . //-------------+---------------------------------------------------------------- . // other_inc | -.014015 .0009836 -14.25 0.000 -.0159428 -.0120872 . // fem_educ | .0545129 .0066007 8.26 0.000 .0415758 .06745 . // kids | -.0470199 .0123397 -3.81 0.000 -.0712052 -.0228346 . // male_educ | 0 (omitted) . //------------------------------------------------------------------------------
This marginal effects are emulated using pr2 after myivprobit:
. est restore myivp (results myivp are active now) . margins, dydx(*) predict(pr2) force (note: prediction is a function of possibly stochastic quantities other than e(b)) Average marginal effects Number of obs = 500 Model VCE : OIM Expression : P(y=1|X) FIML, predict(pr2) dy/dx w.r.t. : fem_educ kids other_inc male_educ ------------------------------------------------------------------------------ | Delta-method | dy/dx Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- fem_educ | .0545129 .0066003 8.26 0.000 .0415766 .0674493 kids | -.0470199 .0123394 -3.81 0.000 -.0712047 -.0228351 other_inc | -.014015 .0009837 -14.25 0.000 -.0159431 -.0120869 male_educ | 0 (omitted) ------------------------------------------------------------------------------
You will see that the above command also includes "male_educ" in the list of exogenous variables, because
this is an explanatory variable for at least one equation in the model. However, this not being inlcuded in the
structural equation, has a no effect on \(P(y=1|x) \).
When we reached Stata 14.1. A change was introduced in how probabilities were calculated after ivprobit.
As it says in the "whatsnew" material, the new formulation would take into account endogeneity.
Specifically, they use what I call the 2sls predicted probabilities, following equation (5):
$$P(y^**1 >0|z_1,y_2,u_2) =\Phi \left(
z_1 \frac{\beta_1}{\sqrt{1-\rho^2}} +
y_2 \frac{\beta_2}{\sqrt{1-\rho^2}} +
\frac{\rho}{\sqrt{1-\rho^2}} \frac{u_2}{\sigma_2}
\right ) \quad (11)
$$
With the caveat \( u_2 \) was substituted by \( y_2 - z_1 \delta_1 - z_2 \):
$$P(y^{**}_1 >0|z_1,y_2)= P(y_1=1|z_1,y_2,u_2) =\Phi \left(
z_1 \frac{\beta_1}{\sqrt{1-\rho^2}} +
y_2 \frac{\beta_2}{\sqrt{1-\rho^2}} +
\frac{\rho}{\sqrt{1-\rho^2}} \frac{y_2 - z_1 \delta_1 - z_2 \delta_2}{\sigma_2}
\right ) \quad (12)
$$
While the two equations above are basically the same, they have important differences when marginal
effects are estimated by software.
So let me explain first what Stata 14.1, and 15 did.
To estimate marginal effects, partial derivatives were based on equation 12:
$$ \frac{\partial P(y=1|.)}{\partial z_1} =
\phi(.)*\left( \frac{\beta_1}{\sqrt{1-\rho^2}}
-\frac{\rho}{\sqrt{1-\rho^2}} * \frac{\delta_1}{\sigma_2} \right) $$
$$ \frac{\partial P(y=1|.)}{\partial z_2} =
\phi(.)*\left( 0
-\frac{\rho}{\sqrt{1-\rho^2}} * \frac{\delta_2}{\sigma_2} \right) $$
$$ \frac{\partial P(y=1|.)}{\partial y_2} =
\phi(.)*\left( \frac{\beta_2}{\sqrt{1-\rho^2}}
+\frac{1}{\sqrt{1-\rho^2}} * \frac{1}{\sigma_2} \right) $$
So, from the technical point of view, this partial derivatives are correct, since they are capturing the direct,
and indirect partial effects. A kind of total derivative, rather than partial derivatives.
The problem, however, is that this assumes we could actually observe how the unobserved component \( u_2 \) changes when other variables
change. Standard regression analysis, however, would say that these unobserved components should be considered as fixed, and instead one should
estimate marginal effects averaging over the unobserved factors. Thus, the second term on each one of the above derivatives should be zero.
Nevertheless, lets replicate this. First, if you are using Stata 16, the following behaivor may no longer be replicable:
. //. margins, dydx(*) predict(pr) . // . //Average marginal effects Number of obs = 500 . //Model VCE : OIM . // . //Expression : Probability of positive outcome, predict(pr) . //dy/dx w.r.t. : other_inc fem_educ kids male_educ . // . //------------------------------------------------------------------------------ . // | Delta-method . // | dy/dx Std. Err. z P>|z| [95% Conf. Interval] . //-------------+---------------------------------------------------------------- . // other_inc | -.0097802 .0014994 -6.52 0.000 -.012719 -.0068414 . // fem_educ | .0623273 .007099 8.78 0.000 .0484135 .076241 . // kids | -.0614265 .0139446 -4.41 0.000 -.0887574 -.0340956 . // male_educ | -.0194406 .0022103 -8.80 0.000 -.0237728 -.0151084 . //------------------------------------------------------------------------------
If I would like to replicate this using myivprobit, I would estimate marginal effects using option "pr1", and requesting derivates to be estimated without the "chain" option. This makes sure that one takes into account the effect of all changes in \( y_2 , z_1 \) on the outcome \( P(y=1|.) \). Not using this option would ignore, for example, the effect of \(y_2\) that comes through the error \(u_2\). (again, assuming this derivatives are correct).
. est restore myivp (results myivp are active now) . margins, dydx(*) predict(pr1) force nochain (note: prediction is a function of possibly stochastic quantities other than e(b)) Average marginal effects Number of obs = 500 Model VCE : OIM Expression : P(y=1|X) two-step, predict(pr1) dy/dx w.r.t. : fem_educ kids other_inc male_educ ------------------------------------------------------------------------------ | Delta-method | dy/dx Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- fem_educ | .0623273 .0060316 10.33 0.000 .0505055 .074149 kids | -.0614265 .0128383 -4.78 0.000 -.0865891 -.0362639 other_inc | -.0097802 .0009828 -9.95 0.000 -.0117065 -.0078539 male_educ | -.0194406 .0074886 -2.60 0.009 -.0341181 -.0047631 ------------------------------------------------------------------------------So what do we see here? First, that the Stata 14, 15, and early 16 versions were estimating a partial effect that inadvertently accounted for a second-order effect (through the first stage regression). This would normally have a small effect on the exogenous variables but will have a noticeable effect on the endogenous variable of interest.
. * two step procedure . reg $y2 $z1 $z2 Source | SS df MS Number of obs = 500 -------------+---------------------------------- F(3, 496) = 34.36 Model | 28864.2732 3 9621.42439 Prob > F = 0.0000 Residual | 138881.269 496 280.002558 R-squared = 0.1721 -------------+---------------------------------- Adj R-squared = 0.1671 Total | 167745.542 499 336.163411 Root MSE = 16.733 ------------------------------------------------------------------------------ other_inc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- fem_educ | .3351866 .2837344 1.18 0.238 -.2222829 .8926562 kids | .8329056 .5497701 1.52 0.130 -.2472597 1.913071 male_educ | 2.845253 .2838838 10.02 0.000 2.28749 3.403016 _cons | 9.872562 5.049432 1.96 0.051 -.0483506 19.79347 ------------------------------------------------------------------------------ . predict double u2, resid . probit $y1 $z1 $y2 u2 Iteration 0: log likelihood = -344.63508 Iteration 1: log likelihood = -252.10819 Iteration 2: log likelihood = -252.04529 Iteration 3: log likelihood = -252.04529 Probit regression Number of obs = 500 LR chi2(4) = 185.18 Prob > chi2 = 0.0000 Log likelihood = -252.04529 Pseudo R2 = 0.2687 ------------------------------------------------------------------------------ fem_work | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- fem_educ | .227437 .0273171 8.33 0.000 .1738964 .2809775 kids | -.1961748 .0478084 -4.10 0.000 -.2898776 -.102472 other_inc | -.058473 .0090228 -6.48 0.000 -.0761573 -.0407887 u2 | .0240492 .0094295 2.55 0.011 .0055677 .0425306 _cons | .3956061 .4784993 0.83 0.408 -.5422352 1.333448 ------------------------------------------------------------------------------ . margins, dydx(*) predict(pr) Average marginal effects Number of obs = 500 Model VCE : OIM Expression : Pr(fem_work), predict(pr) dy/dx w.r.t. : fem_educ kids other_inc u2 ------------------------------------------------------------------------------ | Delta-method | dy/dx Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- fem_educ | .0646175 .0060271 10.72 0.000 .0528045 .0764304 kids | -.0557355 .0129302 -4.31 0.000 -.0810782 -.0303929 other_inc | -.0166128 .0022499 -7.38 0.000 -.0210225 -.0122032 u2 | .0068326 .002632 2.60 0.009 .0016741 .0119912 ------------------------------------------------------------------------------Standard errors would not be corrrect, but bootstrap could be applied to obtain corrected standard errors.
. ** built-in command . . qui:ivprobit $y1 $z1 ($y2 = $z2), ml . margins, dydx(*) predict(pr) Average marginal effects Number of obs = 500 Model VCE : OIM Expression : Average structural function probabilities, predict(pr) dy/dx w.r.t. : other_inc fem_educ kids ------------------------------------------------------------------------------ | Delta-method | dy/dx Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- other_inc | -.0166128 .0012889 -12.89 0.000 -.019139 -.0140867 fem_educ | .0646175 .0073529 8.79 0.000 .050206 .079029 kids | -.0557355 .0144233 -3.86 0.000 -.0840047 -.0274664 ------------------------------------------------------------------------------
So you can see that the two-step approach and the built-in approach
now provide the same marginal effects. And since the official command
estimates all coefficients simultaneously, the standard errors can be taken as correct...Let me get back to this point later.
So how can we correct for this with our predict program. Since there is nothing
to prevent margins to obtain numerical derivative across both equations, we need to
modify slighly how the model is estimated. First, we create clone copies of all
variables that enter the second stage: z_1 and y_2, and use them for the model estimation:
. clonevar c_other_inc = other_inc . clonevar c_fem_educ = fem_educ . clonevar c_kids = kids . global y2b c_other_inc . global z1b c_fem_educ c_kids . . ml model lf myivprobit_2sls ($y1 = $z1 $y2 ) (theta:) ($y2b = $z1b $z2 ) (lnsigma:) , /// > technique(nr bhhh) init(lnsigma:_cons = 2.81 ) maximize nolog . adde local predict myivprobit_p . est sto myivp
The idea of using "clones" of the exogenous variables \( z_1 \) and endogenous \( y_2 \)
is to have access to the same information as the original data,
but making sure \( \hat{u}_2 \) does not change when the original data changes.
Marginal effects can be calculated a I did before, except that I now make it explicit to request marginal effects
with respect to \(z_1 \quad \& \quad y_2 \) only.
. est restore myivp (results myivp are active now) . margins, dydx($z1 $y2) predict(pr1) force Average marginal effects Number of obs = 500 Model VCE : OIM Expression : P(y=1|X) two-step, predict(pr1) dy/dx w.r.t. : fem_educ kids other_inc ------------------------------------------------------------------------------ | Delta-method | dy/dx Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- fem_educ | .0646175 .006331 10.21 0.000 .0522089 .0770261 kids | -.0557355 .0134693 -4.14 0.000 -.0821349 -.0293362 other_inc | -.0166128 .0023499 -7.07 0.000 -.0212185 -.0120072 ------------------------------------------------------------------------------
And done. we have been able to reproduce the second version, two-step, marginal effects
for the instrumental variable probit model, that follows the two-step approach advocated by Prof. Wooldridge.
Furthermore, this also reproduces what ivprobit currently does!
There is only one last perky detail. If you look at the marginal effect
standard errors I produce with the myivprobit command, and compare it with the marginal effects
the ivprobit command produces, you will notice they are different.
I'm still uncertain why this may be happening, and I'm currently discussing with people at Stata about the "correct" way to obtain standard errors, but here are the cliff notes:
Recall that IVprobit uses the following formula to estimate marginal effects with respect to \(y_2 \) is:
$$ \frac{\partial P(y=1|.)}{\partial y_2} =
\phi \left(z_1 \frac{\beta_1}{\sqrt{1-\rho^2}} +
y_2 \frac{\beta_2}{\sqrt{1-\rho^2}} +
\frac{\rho}{\sqrt{1-\rho^2}} \frac{\hat{u}_2}{\sigma_2} \right)* \left( \frac{\beta_2}{\sqrt{1-\rho^2}} \right) $$
My own version of predict uses the same formula, but modified for the two-step approach.
If one needs to estimate standard errors for these partial effects, it is necessary to account for the uncertainty of all
estimated coefficients in the model, including the uncertainly of the errors \( \hat{u}_2 \).
Unofficial words (still under discussion) indictates that the current formulation assumes \( \rho \) and \( \sigma_2 \) to be constant,
when standard errors are calculated.
While this may seem incorrect, I understand the intuition behind this idea.
If you recall the estimation of marginal effects from the structural equation:
$$\frac{\partial P(y_1=1|.)}{\partial y_2} = \phi ( z_1 \beta_1 + y_2 \beta_2 ) \beta_2 $$
you will see that this are not affected by \( \rho \) or \( \sigma_2 \). Perhaps this is one of the reasons why the currently
estimated marginal effects standard errors are so similar to the ones based on the "old" structural marginal effects.
My own command, however, accounts for the uncertainty in \( \rho \) and \( \sigma_2 \). This also seems correct
since two-step marginal procedures are expected to be less efficient than the Fullinformation counterparts.
Of course, if you prefer to have a tie-breaker on which one is correct, I can use a Bootstrap procedure to produce
the elusive standard errors.
Basically, I'll use the manual two-step procedure, alomg with a 250 bootstrap repetitions, to report the results:
. program bs_ivprobit, eclass 1. reg $y2 $z1 $z2 2. capture drop u2 3. predict double u2, resid 4. probit $y1 $z1 $y2 u2 5. margins, dydx(*) predict(pr) nose post 6. end . bootstrap , reps(250) seed(1) nodots:bs_ivprobit Average marginal effects Number of obs = 500 Replications = 250 ------------------------------------------------------------------------------ | Observed Bootstrap Normal-based | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- fem_educ | .0646175 .0058983 10.96 0.000 .053057 .076178 kids | -.0557355 .0138358 -4.03 0.000 -.0828532 -.0286179 other_inc | -.0166128 .0023778 -6.99 0.000 -.0212732 -.0119525 u2 | .0068326 .0027569 2.48 0.013 .0014292 .0122361 ------------------------------------------------------------------------------
In this case, it seems that the bootstrap estimates seem to favor my version of marginal effects and standard errors. However, you are free to take this as limited evidence, and use either strategy as your own.
The command -ml- is a powerful tool that can be used to estimate single or multiple equation models, as long as
the loglikelihood functions (and their inter-relations) can be properly defined.
-margins- is also a very flexible command that can be easily combined with -ml- to expand the estimation of marginal effects for properly defined outcomes. While the command is flexible and relatively easy to use, these properties can also be double+edge swords, if one is not aware of the mechanics behind the actual estimation of partial effects.
In my view, the original estimation of marginal effects after iv-probit was correct, but the changes
it received in Stata 14.1 introduced what we could call a bug, that was based on solid Math. However, unless you dig deeper into what ivprobit tries to estimate, it would be difficult to say why that change produced undesirable results.
While the last update on Stata 16 has made a correction that now produces correct partial effects (two-step like),
I still believe there is a remaining bug in the program that may be affecting the estimation of standard errors.
And of course, margins could also go back to the beginning, and provide users the option for the estimation of MLE-structural marginal effects.
Who knows, this may be going to the Statalist wishlist...
Today's development! As of April 1st, 2021, the way of how IVprobit works has changed for the estimation of standard errors!
If you happen to read my post before, (or what you can see above), when I wrote it i made the case that Standard errors for IVprobit were
not being estimated correctly, because they didn't account for errors on some of the structural parameters (rho and sigma). However, Today (April 1, 2021), Stata has pushed an update
fixing this!. On the one side, the code above will no longer be reproducible. On the brighter side, it seems my hunch was right!
And now, standard errors for IVprobit are estimated the right way! The circle is complete!
So, just for closure, the example today:
. webuse laborsup . global y1 fem_work . global z1 fem_educ kids . global y2 other_inc . global z2 male_educ . qui: ivprobit $y1 $z1 ($y2 = $z2), ml . margins, dydx(*) predict(pr) Average marginal effects Number of obs = 500 Model VCE : OIM Expression : Average structural function probabilities, predict(pr) dy/dx w.r.t. : other_inc fem_educ kids ------------------------------------------------------------------------------ | Delta-method | dy/dx Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- other_inc | -.0166128 .0023501 -7.07 0.000 -.0212189 -.0120068 fem_educ | .0646175 .0063309 10.21 0.000 .0522091 .0770258 kids | -.0557355 .0134692 -4.14 0.000 -.0821347 -.0293364 ------------------------------------------------------------------------------